APPLYING NAIVE BAYES TO THE CREATED DATASET

CHECKING THE PERFORMANCE OF "BAG OF WORDS" DATA

THE BEST VALUE FOR THE HYPERPARAMETER ALPHA IS "1", SINCE AT THAT VALUE THE TRAINING AUC AND CV AUC ARE CLOSE TO EACH OTHER WHILE BOTH REMAIN HIGH
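The selection rule described above can be sketched as follows. This is only a minimal illustration, not the notebook's actual code: the helper name `pick_alpha`, the `max_gap` threshold, and every score in the example dictionaries are hypothetical placeholders, not results from this project.

```python
def pick_alpha(train_auc, cv_auc, max_gap=0.05):
    """Pick alpha with the highest CV AUC among candidates whose
    train/CV AUC gap is at most max_gap (assumed threshold).
    If no candidate meets the gap, fall back to the best CV AUC overall."""
    candidates = [a for a in cv_auc if abs(train_auc[a] - cv_auc[a]) <= max_gap]
    if not candidates:
        candidates = list(cv_auc)
    return max(candidates, key=lambda a: cv_auc[a])


# PLACEHOLDER scores for illustration only, NOT results from this project.
train_auc = {0.001: 0.90, 0.1: 0.80, 1: 0.72, 10: 0.65}
cv_auc = {0.001: 0.66, 0.1: 0.68, 1: 0.69, 10: 0.64}

best_alpha = pick_alpha(train_auc, cv_auc)
print(best_alpha)
```

With these placeholder scores, the small alphas are rejected because the training AUC sits far above the CV AUC (a sign of overfitting), and among the remaining candidates the one with the higher CV AUC is kept.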

THE TEST AUC IS 70%
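For reference, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration, not data from this project.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, purely illustrative.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
auc = roc_auc(y_true, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of examples, so it is meant to show the definition rather than to be used on the full dataset.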

THE MODEL PERFORMED WELL ON THE "BAG OF WORDS" DATA

NOW CHECKING THE PERFORMANCE OF "TFIDF" DATA BY APPLYING NAIVE BAYES

THE BEST VALUE FOR ALPHA IS FOUND TO BE "1"

TEST ACCURACY FOR TF-IDF VECTORIZED DATA IS 67%

TESTING ACCURACY FOR TF-IDF W2V DATA USING NAIVE BAYES

TEST ACCURACY IS 58%

RESULTS

FOR NAIVE BAYES, THE BEST ACCURACY (68%) IS OBTAINED WITH BOW

PLOTTING A CONFUSION MATRIX

CONFUSION MATRIX FOR THE MOST ACCURATE NAIVE BAYES MODEL (BOW)
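As a reminder of what the confusion matrix contains, here is a minimal sketch for binary 0/1 labels. It assumes rows are true labels and columns are predicted labels (the same layout scikit-learn uses); the helper names and the toy labels are hypothetical and not taken from this project.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1
    (rows: true label, columns: predicted label)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Accuracy is the diagonal (correct predictions) over all predictions."""
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


# Toy labels, purely illustrative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
```

Reading accuracy off the diagonal also shows why accuracy alone can hide errors on the minority class, which is what the matrix is plotted to reveal.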

APPLYING XGBOOST TO THE GIVEN DATA

TEST ACCURACY OBTAINED FOR BAG OF WORDS DATA IS 66%

TESTING ACCURACY FOR TFIDF DATA

THE ACCURACY OBTAINED FOR TFIDF VECTORIZED DATA IS 67%

TESTING ACCURACY FOR TFIDFW2V DATA

THE ACCURACY OBTAINED FOR TFIDFW2V DATA IS 59%

CONFUSION MATRIX FOR THE MOST ACCURATE XGBOOST MODEL (TFIDF)

APPLYING LOGISTIC REGRESSION TO THE GIVEN DATA

Applying BAG OF WORDS vectorized data to LOGISTIC REGRESSION

THE ACCURACY OBTAINED FOR BOW WITH LOGISTIC REGRESSION IS 0.677 OR 68%

APPLYING TFIDF VECTORIZED DATA TO LOGISTIC REGRESSION

THE ACCURACY OBTAINED FOR TFIDF IS 0.728 OR 73%

THE ACCURACY OBTAINED FOR TFIDFW2V VECTORIZED DATA IS 0.59 OR 59%

FINAL PROJECT SUMMARY

FINAL CONCLUSION: USING "TFIDF" AS THE VECTORIZER WITH "LOGISTIC REGRESSION" GIVES THE HIGHEST ACCURACY (73%) ON THE DONORSCHOOSE PROBLEM